HealDA Integration by aayushg55 · Pull Request #1356 · NVIDIA/physicsnemo

aayushg55 · 2026-01-28T19:43:50Z

PhysicsNeMo Pull Request

Description

This integrates the HealDA AI data assimilation model training and inference pipelines into PhysicsNeMo. This includes the following in the examples/weather/healda directory:

UFS Replay Observation dataset ETL
HealDA training recipe
End-to-end forecasting example using HealDA initial conditions with FourCastNet3

The HealDA Observation Embedding and DiT architectures are integrated at physicsnemo/models/healda.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
The CHANGELOG.md is up to date with these changes.
An issue is linked to this pull request.
If I am implementing a new model or modifying any existing model, I have followed the Models Implementation Coding Standards.

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is it an indication that the PR will be accepted / rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.


aayushg55 · 2026-01-28T21:23:41Z

@greptileai

greptile-apps · 2026-01-28T21:27:39Z

Greptile Overview

Greptile Summary

This PR integrates the HealDA (Hierarchical Lattice Data Assimilation) AI model into PhysicsNeMo, adding training and inference pipelines for weather forecasting with observation data assimilation.

Major Changes:

Added DiT (Diffusion Transformer) model architecture in physicsnemo/models/healda/
Implemented HEALPix-based patch embedding and decoding layers
Added multi-sensor observation embedding system for satellite and conventional weather observations
Included UFS Replay Observation dataset ETL pipeline and training recipes
Added end-to-end forecasting example integrating HealDA with FourCastNet3
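For orientation, the pixel and token counts of a HEALPix patch embedding can be sketched as follows. This assumes that the layer groups pixels in NESTED ordering with a patch side p that is a power of two; the PR's HPXPatchEmbed may be arranged differently. In NESTED ordering, the four children of pixel j at one resolution are pixels 4j through 4j+3 at the next finer resolution. Each coarse token therefore owns a contiguous run of p^2 fine pixels.

```latex
N_{\text{pix}} = 12\,N_{\text{side}}^{2}, \qquad
N_{\text{tok}} = 12\left(\frac{N_{\text{side}}}{p}\right)^{2} = \frac{N_{\text{pix}}}{p^{2}}, \qquad
\operatorname{tok}(i) = \left\lfloor \frac{i}{p^{2}} \right\rfloor
```

For example, N_side = 64 and p = 4 give 49,152 pixels grouped into 3,072 tokens of 16 pixels each.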

Critical Issues Found:

MOD-001 Violation (MUST FIX): All main model classes (DiT, HPXPatchEmbed, HPXPatchDecode, MultiSensorObsEmbedding, ObsDecoder) inherit from torch.nn.Module instead of physicsnemo.Module. This violates the coding standard and prevents these models from benefiting from PhysicsNeMo's serialization, versioning, and registry features.
MOD-003 Violations: Multiple docstring formatting issues including missing raw string prefix r""", incomplete docstring sections (missing Parameters, Forward, Outputs), and improper NumPy style formatting.
Missing Comprehensive Tests: While basic smoke tests exist, the models lack the required constructor/attributes tests (MOD-008a), non-regression tests with reference data (MOD-008b), and checkpoint loading tests (MOD-008c).
Unused Code: self.silu is initialized but never used in AdaLayerNormZero and AdaLayerNormTemporalAttn classes.

Recommendations:

Fix MOD-001 violations by changing all model base classes to inherit from physicsnemo.Module
Add proper super().__init__(meta=...) calls with appropriate metadata
Complete docstring sections following MOD-003 guidelines
Add comprehensive test coverage per MOD-008 requirements before moving out of experimental status

Important Files Changed

Filename	Overview
physicsnemo/models/healda/dit.py	adds DiT transformer model for data assimilation, but violates MOD-001 (inherits from torch.nn.Module instead of physicsnemo.Module) and has docstring formatting issues
physicsnemo/models/healda/healpix_layers.py	adds HEALPix patch embedding/decoding layers, but violates MOD-001 and MOD-003c (missing proper docstring sections)
physicsnemo/models/healda/obs_embedding/point_embed.py	implements multi-sensor observation embedding, but violates MOD-001 (inherits from torch.nn.Module)
physicsnemo/models/healda/obs_embedding/decoder.py	adds observation decoder from latent representations, but violates MOD-001
test/models/healda/test_dit.py	basic smoke tests for DiT model instantiation and forward pass, but lacks comprehensive tests required by MOD-008a, MOD-008b, MOD-008c

greptile-apps

5 files reviewed, 9 comments


coreyjadams

Hi @aayushg55 ,

There are some things to address in this PR before we can review it properly. I haven't looked at the logic of the code; I'll let a subject matter expert review it.

The code in physicsnemo/models/healda is not aligned with the standards we're moving to for physicsnemo v2.0. All of this feedback is arising because physicsnemo was becoming very fragmented, circular, and unmanageable with such large PRs. We're attempting to improve the user and developer experience by reducing some of this duplication, coding practice violations, etc. Your PR is coming as we're starting to really enforce this stuff - sorry. But we'll have to get the model implementation up to spec before merging.

Some specific things I have found looking quickly, though certainly not everything:

model code must enter physicsnemo through the experimental folder. We already have a DiT in the experimental folder, and if they can be combined to maximize overlap and minimize code repetition we should do that.
We also already have a substantial amount of HEALPix and earth2grid layers/work. We can't duplicate here; please extend the existing layers as needed, but repetition isn't maintainable.
There is a lot of tooling in the models folder that does not belong there. profiling.py and types.py come to mind as utilities. sharding looks like it might be domain-parallel specific but I don't know what it's doing. We have a whole suite of distributed and domain parallel tooling, perhaps those should be there.
bare earth2grid imports in physicsnemo are not allowed.
We have a number of embeddings already that seem very similar to some of the ones in embedding.py. Let's not duplicate.
avoid importing from a subdirectory (obs_embedding) into a higher directory like you do; it's just a circular import waiting to happen.
you have files missing license headers in places.

We have a pre-commit system in physicsnemo that should have caught a lot of these - did you try it?

I think this PR has a ways to go before proper review can start. Would you like to convert it to draft and we can help you? We can schedule a meeting with someone on the physicsnemo team to give you some guidance.

All the best,
Corey

aayushg55 · 2026-01-28T22:58:41Z

Thanks @coreyjadams for taking a look. I agree this needs significant work and refactoring for a proper integration with physicsnemo, and this PR was mainly intended to start the public release process.

I’ll mark this as a draft for now. I believe @NickGeneva will be helping me bring this into better shape.

Re: pre-commit — Yes, I had run the pre-commit hooks and resolved the linting/license issues it flagged.

nbren12 · 2026-01-29T01:30:10Z

Keep in mind that we need to maintain checkpoint compatibility, since this checkpoint has been heavily validated and released via a publication. Some of the concerns about apparent code duplication need to be weighed against that. Refactoring and renaming state dict keys are fine, but these changes should not change the answer. Hundreds of person-hours are invested in this checkpoint, so we shouldn't be too rigid.

The HPX embedding layer strikes me as a risky component to refactor, and also a fairly simple one that shouldn't increase the maintenance burden much. Not sure if you have an HPX layer yet.
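The following is a minimal sketch of one way to honor this constraint. It assumes that refactored modules change only parameter names. At load time, it translates the released checkpoint's keys to the new module names and leaves every value untouched. `remap_state_dict` and the prefixes in the example are hypothetical and are not names from this PR. Plain floats stand in for tensors, so the sketch runs without PyTorch.

```python
from typing import Dict, Mapping, TypeVar

V = TypeVar("V")


def remap_state_dict(
    state: Mapping[str, V], prefix_map: Mapping[str, str]
) -> Dict[str, V]:
    """Rename checkpoint keys by prefix; values are passed through unchanged.

    The first matching prefix in `prefix_map` (in insertion order) wins.
    """
    out: Dict[str, V] = {}
    for key, value in state.items():
        new_key = key
        for old, new in prefix_map.items():
            if key.startswith(old):
                new_key = new + key[len(old):]
                break
        # Two old keys collapsing onto one new key would silently drop weights.
        if new_key in out:
            raise KeyError(f"duplicate key after remapping: {new_key!r}")
        out[new_key] = value
    return out


# Hypothetical old/new module names, for illustration only.
old_ckpt = {"blocks.0.attn.qkv.weight": 1.0, "pos_embed": 2.0}
new_ckpt = remap_state_dict(old_ckpt, {"blocks.": "transformer.blocks."})
print(sorted(new_ckpt))  # ['pos_embed', 'transformer.blocks.0.attn.qkv.weight']
```

With PyTorch, the remapped dict would then go through `model.load_state_dict(new_ckpt, strict=True)`, so any missing or unexpected key fails loudly. A non-regression test that compares outputs against a stored reference from the original checkpoint would then check that the answer did not change.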

- removed ObsDecoder/Sharding/Profiling/SubDomain
- HpxPatch Embed/Decode subclass the existing DiT tokenizers
- refactored the noise+condition embedding into a module to be compatible
- moved HealDA to experimental
- added DropPath to DiT
- breaking changes for dropout in DiT MlpLayer and TE Attn backend (proj_out)
- moved HealDA model config classes to examples, but retained sensor-related configs
- flattened obs_embedding subdir

Made-with: Cursor

nbren12 · 2026-03-17T23:32:52Z

+    This allows converting map-style datasets to iterable-style. It loads data
+    from the dataloaders in a round-robin manner, removing exhausted dataloaders
+    from rotation until ALL dataloaders are exhausted.
+


Suggested change

When combined with the `sampler` argument to DataLoader, this provides explicit control over the order in which an individual multiprocessing worker accesses samples, which is useful for optimizing read performance on chunked datasets.
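The following is a minimal sketch of the round-robin behavior that this docstring describes. Plain iterables stand in for DataLoaders, so it runs without PyTorch. `round_robin` is a hypothetical name, and the PR's actual utility may differ in interface and detail.

```python
from collections import deque
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def round_robin(loaders: Iterable[Iterable[T]]) -> Iterator[T]:
    """Yield one item from each loader in turn, dropping exhausted loaders.

    Iteration stops only when ALL loaders are exhausted.
    """
    # Queue of live iterators; an exhausted one is simply not re-queued.
    active = deque(iter(loader) for loader in loaders)
    while active:
        it = active.popleft()
        try:
            item = next(it)
        except StopIteration:
            continue  # exhausted: remove this loader from the rotation
        active.append(it)  # still live: move it to the back of the rotation
        yield item


print(list(round_robin([[1, 2, 3], [10], [20, 21]])))  # [1, 10, 20, 2, 21, 3]
```

In one plausible arrangement, each iterable would be a DataLoader built with its own `sampler`. The sampler would then fix the read order within a loader, and the round-robin would fix the interleaving across loaders.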

nbren12 · 2026-03-17T23:36:53Z

@@ -0,0 +1,194 @@
+# SPDX-FileCopyrightText: Copyright (c) 2023 - 2026 NVIDIA CORPORATION & AFFILIATES.


It would be nice to see these data loading utilities either moved into the core package (even if only as a hidden API) or dropped in favor of re-chunking the data as a preprocessing step.

Overall, I think these are useful since they allow optimizations similar to DALI's (e.g., preloading onto the GPU in a separate thread) without having to use DALI.
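The following is a minimal sketch of the DALI-style idea mentioned here. A background thread pulls items from a loader and applies a transfer step ahead of the consumer, buffering a few results in a bounded queue. `prefetch` and its parameters are hypothetical. With PyTorch, `transfer` might be something like `lambda b: b.to("cuda", non_blocking=True)`, but that is an assumption about usage, not a description of what the PR does. The sketch uses only the standard library.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")

_DONE = object()  # sentinel: the source iterable is exhausted


class _Failure:
    """Carries an exception from the worker thread to the consumer."""

    def __init__(self, exc: Exception) -> None:
        self.exc = exc


def prefetch(
    source: Iterable[T], transfer: Callable[[T], U], depth: int = 2
) -> Iterator[U]:
    """Yield transfer(item) for each item, computed in a background thread.

    At most `depth` finished items wait in the queue, bounding memory use.
    """
    buf: queue.Queue = queue.Queue(maxsize=depth)

    def worker() -> None:
        try:
            for item in source:
                buf.put(transfer(item))  # blocks while the queue is full
        except Exception as exc:
            buf.put(_Failure(exc))
            return
        buf.put(_DONE)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    while True:
        out = buf.get()
        if out is _DONE:
            break
        if isinstance(out, _Failure):
            raise out.exc  # re-raise the worker's error in the consumer
        yield out
    thread.join()


print(list(prefetch(range(5), lambda x: x * 2)))  # [0, 2, 4, 6, 8]
```

If the consumer stops early, the daemon worker stays blocked on the full queue. A production version would need an explicit stop signal.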

nbren12 · 2026-03-19T16:26:23Z

BTW, can we keep the src/healda package structure? It would be nice to keep the overall structure the same as our other code base, to make updating the recipe simpler.

aayushg55 and others added 15 commits January 26, 2026 00:17

Add HealDA to PhysicsNeMo subtree

f5a53ef

Add healda integration to physicsnemo

380434a

delete file

dbf153a

Fix license headers

fc97759

more linting

1321d06

more linting

ef4111c

more linting

3ac1ad9

arxiv link

87e0716

readme cleanup

3ec5ee8

skip cuda tests when unavailable

70d610f

Merge commit '66aa539ef6df4e7f9388aecee9fecfff46f8676f' into pnm-integration

e505a28

Cleanup to ETL and new FCN3 forecasting example

53b1144

- deleted old normalizations for IR
- corrected normalization script for new parquet format, deleting separate conv file
- unified conv QC filtering ranges to sensors.py
- added end-to-end forecast -> scoring -> plotting with FCN3

update readme

4928fc9

Fixed training and inference pipelines

13f377d

Merge commit 'bfe710511f88ce6c78e8b1ca6900ee073535ec46' into pnm-integration

7bc5ae0

aayushg55 marked this pull request as ready for review January 28, 2026 19:54

Remove acc scoring and other unneeded files

ea0b4f3

greptile-apps Bot reviewed Jan 28, 2026


coreyjadams requested changes Jan 28, 2026


aayushg55 marked this pull request as draft January 28, 2026 22:58

NickGeneva self-assigned this Jan 29, 2026

aayushg55 added 5 commits February 2, 2026 22:01

Merge branch 'main' of github.com:NVIDIA/physicsnemo into healda

fc57101

HealDA working with timm backend, debugging TE errors

e6178a9

fixed te issue - qkv_layout difference

0ad266c

Merge branch 'main' of github.com:NVIDIA/physicsnemo into healda

d5aa7a0

aayushg55 added 21 commits February 3, 2026 22:41

cleanup documentation

c11ac9c

Merge branch 'main' of github.com:NVIDIA/physicsnemo into healda

2ff6109

delete old tests

62d0c38

update healda readme with ckpt/e2studio links

19bd5fa

update readme

7a296a7

update license headers

1755f50

Made-with: Cursor

update inference

d3394ec

update base

b27151c

update readme

e54de49

cleanup model config

da9f14d

cleanup dataset

34ba868

update license

96cdef8

update transform and loading

cc1ad80

update model setup to use new pnm model

d400fcf

use pnm checkpointing

fd741ec

Merge branch 'main' of github.com:NVIDIA/physicsnemo into healda

24bb714

fix license headers

7e227d9

fix batch keys

71eff55

ensure conv gets 1 platform

e43eeec

update model config

2d70390

reduce logging

aa63f5a

nbren12 reviewed Mar 17, 2026


4 participants
